Feature Selection by Approximating the Markov Blanket in a Kernel-Induced Space

Authors

  • Qiang Lou
  • Zoran Obradovic
Abstract

The proposed feature selection method aims to find a minimal subset of the most informative variables for classification or regression by efficiently approximating the Markov Blanket, a set of variables that can shield a certain variable from the target. Instead of relying on conditional independence tests or network structure learning, the new method uses the Hilbert-Schmidt Independence Criterion as a measure of dependence among variables in a kernel-induced space. This allows effective approximation of a Markov Blanket that consists of multiple dependent features rather than being limited to a single feature. In addition, the new method can remove irrelevant and redundant features at the same time. The method is applicable to both discrete and continuous variables, whereas previous methods cannot be used directly for continuous features and are therefore not applicable to regression problems. Experimental evaluations on synthetic and benchmark classification and regression datasets provide evidence that the new method removes useless variables in both low- and high-dimensional problems more accurately than existing Markov Blanket based alternatives.
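
To illustrate the dependence measure named in the abstract, the following is a minimal sketch of the standard biased empirical estimator of the Hilbert-Schmidt Independence Criterion, trace(KHLH) / (n - 1)^2. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidths, the synthetic data, and the function names `gaussian_gram` and `hsic` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix; kernel choice and bandwidth are assumptions."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one variable
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    K = gaussian_gram(x, sigma_x)
    L = gaussian_gram(y, sigma_y)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Hypothetical synthetic data: one feature tied to the target, one pure noise.
rng = np.random.default_rng(0)
n = 200
target = rng.normal(size=n)
relevant = target + 0.1 * rng.normal(size=n)
irrelevant = rng.normal(size=n)

score_relevant = hsic(relevant, target)
score_irrelevant = hsic(irrelevant, target)
print(score_relevant, score_irrelevant)
```

Because `gaussian_gram` accepts a two-dimensional array, the same estimator can score a group of features jointly against the target, which is the property the abstract relies on when it speaks of a Markov Blanket made of multiple dependent features. How the paper turns such scores into a selection procedure is not described in the abstract and is not sketched here.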

Similar resources

Markov Blanket Ranking using Kernel-based Conditional Dependence Measures

Developing feature selection algorithms that move beyond a pure correlational to a more causal analysis of observational data is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step which variables must pass in order to be included in the conditioning set. As a result, these algorit...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Markov Blanket Feature Selection for Support Vector Machines

Based on Information Theory, optimal feature selection should be carried out by searching Markov blankets. In this paper, we formally analyze the current Markov blanket discovery approach for support vector machines and propose to discover Markov blankets by performing a fast heuristic Bayesian network structure learning. We give a sufficient condition that our approach will improve the perform...

A Unified View of Causal and Non-causal Feature Selection

In this paper, we unify causal and non-causal feature selection methods based on the Bayesian network framework. We first show that the objectives of causal and non-causal feature selection methods are equal and are to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We demonstrate that causal and non-causal feature selection take different...

Feature selection for high-dimensional genomic microarray data

We report on the successful application of feature selection methods to a classification problem in molecular biology involving only 72 data points in a 7130 dimensional space. Our approach is a hybrid of filter and wrapper approaches to feature selection. We make use of a sequence of simple filters, culminating in Koller and Sahami’s (1996) Markov Blanket filter, to decide on particular featur...

Publication date: 2010